Lending Club Loan Data: Loan Book Distribution ("Bokeh" Viz)

Description: Analyze Lending Club's issued loans

These files contain complete loan data for all loans issued through the 2007-2015, including the current loan status ('Current', 'Late', 'Fully Paid', etc.) and latest payment information. Additional features include credit scores, number of finance inquiries, address including zip codes, and state, and collections among others. The file is a matrix of about 890 thousand observations and 75 variables. Here, we use a previously transformed data set, which is however a full copy of the original one. For more information, or if you want to download these data, consult:

Source

1. Loading Libraries and Data Sets


In [1]:
# Required Libraries
import os
import pandas as pd
import numpy as np

In [2]:
# Path Definitions of Required Data Sets
loan_df_path = os.path.join('/media/ML_HOME/ML-Data_Repository/data', 'loan_df')
us_states_GeoJSON = os.path.join('/media/ML_HOME/ML-Data_Repository/maps', 'us_states-albersUSA-Geo.json')

2. Loan Book Distribution across the U.S. States (D3 Choropleths by leveraging the "Bokeh" library)

Here, we provide two choropleth maps concerning the Loan Book Value and the Loan Book Volume distribution across the U.S. States. To do so, we have used the "Bokeh" Python library, a GeoJSON file which defines the U.S. States boundaries and it has been produced from a cartographic boundary shapefile that is provided from the official site of the U.S. Census Bureau, and the Pandas DataFrame grouped_agg_df, where we aggregate the number, and the value of loans per U.S. State. "Bokeh" is a Python library for interactive D3 visualizations!


In [3]:
# Load the Data Set of interest
loan_df = pd.read_pickle(loan_df_path)

In [4]:
# A fast look in the available data set..
loan_df.info(null_counts=True)


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 887379 entries, 0 to 887378
Data columns (total 74 columns):
id                             887379 non-null int64
member_id                      887379 non-null int64
loan_amnt                      887379 non-null float64
funded_amnt                    887379 non-null float64
funded_amnt_inv                887379 non-null float64
term                           887379 non-null category
int_rate                       887379 non-null float64
installment                    887379 non-null float64
grade                          887379 non-null category
sub_grade                      887379 non-null category
emp_title                      835922 non-null object
emp_length                     887379 non-null category
home_ownership                 887379 non-null category
annual_inc                     887375 non-null float64
verification_status            887379 non-null category
issue_d                        887379 non-null datetime64[ns]
loan_status                    887379 non-null object
pymnt_plan                     887379 non-null category
url                            887379 non-null object
desc                           126029 non-null object
purpose                        887379 non-null category
title                          887228 non-null object
zip_code                       887379 non-null category
addr_state                     887379 non-null category
dti                            887379 non-null float64
delinq_2yrs                    887379 non-null int64
earliest_cr_line               887350 non-null datetime64[ns]
inq_last_6mths                 887379 non-null int64
mths_since_last_delinq         887379 non-null int64
mths_since_last_record         887379 non-null int64
open_acc                       887379 non-null int64
pub_rec                        887379 non-null int64
revol_bal                      887379 non-null float64
revol_util                     886877 non-null float64
total_acc                      887379 non-null int64
initial_list_status            887379 non-null category
out_prncp                      887379 non-null float64
out_prncp_inv                  887379 non-null float64
total_pymnt                    887379 non-null float64
total_pymnt_inv                887379 non-null float64
total_rec_prncp                887379 non-null float64
total_rec_int                  887379 non-null float64
total_rec_late_fee             887379 non-null float64
recoveries                     887379 non-null float64
collection_recovery_fee        887379 non-null float64
last_pymnt_d                   869720 non-null datetime64[ns]
last_pymnt_amnt                887379 non-null float64
next_pymnt_d                   634408 non-null datetime64[ns]
last_credit_pull_d             887326 non-null datetime64[ns]
collections_12_mths_ex_med     887379 non-null int64
mths_since_last_major_derog    887379 non-null int64
policy_code                    887379 non-null category
application_type               887379 non-null category
annual_inc_joint               511 non-null float64
dti_joint                      509 non-null float64
verification_status_joint      887379 non-null category
acc_now_delinq                 887379 non-null int64
tot_coll_amt                   887379 non-null int64
tot_cur_bal                    817103 non-null float64
open_acc_6m                    887379 non-null int64
open_il_6m                     887379 non-null int64
open_il_12m                    887379 non-null int64
open_il_24m                    887379 non-null int64
mths_since_rcnt_il             887379 non-null int64
total_bal_il                   21372 non-null float64
il_util                        18617 non-null float64
open_rv_12m                    887379 non-null int64
open_rv_24m                    887379 non-null int64
max_bal_bc                     21372 non-null float64
all_util                       21372 non-null float64
total_rev_hi_lim               817103 non-null float64
inq_fi                         887379 non-null int64
total_cu_tl                    887379 non-null int64
inq_last_12m                   887379 non-null int64
dtypes: category(14), datetime64[ns](5), float64(27), int64(23), object(5)
memory usage: 418.9+ MB

In [5]:
# Compute the "Loan Book Amount & Volume" per "US State"
grouped = loan_df.groupby(by=['addr_state'])
grouped_agg = (grouped[['loan_amnt']].agg(np.sum)
               .rename(columns={'loan_amnt': 'loanbook_amnt_per_state'}))
grouped_agg['loanbook_vol_per_state'] = grouped['loan_amnt'].agg(np.count_nonzero)
grouped_agg_df = grouped_agg.reset_index()
grouped_agg_df.head()


Out[5]:
addr_state loanbook_amnt_per_state loanbook_vol_per_state
0 AK 3.697805e+07 2205.0
1 AL 1.646276e+08 11200.0
2 AR 9.341608e+07 6640.0
3 AZ 2.901101e+08 20412.0
4 CA 1.898145e+09 129517.0

In [6]:
# Prepare the "grouped_agg_df" Data Frame as a JSON file...
# This JSON file has been appropriately joined into the GeoJSON Data Source, "us_states_GeoJSON", that we use here.
grouped_agg_df[:5].to_json(orient='records')


Out[6]:
'[{"addr_state":"AK","loanbook_amnt_per_state":36978050.0,"loanbook_vol_per_state":2205.0},{"addr_state":"AL","loanbook_amnt_per_state":164627650.0,"loanbook_vol_per_state":11200.0},{"addr_state":"AR","loanbook_amnt_per_state":93416075.0,"loanbook_vol_per_state":6640.0},{"addr_state":"AZ","loanbook_amnt_per_state":290110100.0,"loanbook_vol_per_state":20412.0},{"addr_state":"CA","loanbook_amnt_per_state":1898145250.0,"loanbook_vol_per_state":129517.0}]'

2.1 Loan book Value by U.S. States


In [7]:
# Load the necessary libraries for the D3 Visualization
from bokeh.io import show, output_notebook
from bokeh.palettes import (
    YlOrRd9 as palette1, 
    YlGnBu9 as palette2)
from bokeh.plotting import figure
from bokeh.models import (
    GeoJSONDataSource, 
    LogColorMapper,
    HoverTool,
    LogTicker,
    ColorBar)

# Load the enriched GeoJSON Data Source, with the loanbook measures of interest
with open(us_states_GeoJSON, 'r') as f:
    geo_source = GeoJSONDataSource(geojson=f.read())

# Output the Choropleth Plots in Notebook
output_notebook()

# PROVIDE THE CHOROPLETH OF "LOAN BOOK AMOUNT PER STATE"
palette1.reverse()
color_mapper = LogColorMapper(palette=palette1, 
                              low=grouped_agg_df['loanbook_amnt_per_state'].min(),
                              high=grouped_agg_df['loanbook_amnt_per_state'].max())

# Define the figure "Tools" we want to make available
TOOLS = "pan, wheel_zoom, reset, hover, save"

# Plot the figure
# Define the figure dimensions and its general details
p = figure(title="Loan Book Value by U.S. States", tools=TOOLS,
           plot_width=960, plot_height=500, 
           x_range=(0, 960), y_range=(500, 0),
           x_axis_location=None, y_axis_location=None)
           
# Render the "Bokeh" patches in Glyph
p.patches('xs', 'ys', source=geo_source,
          fill_color={'field': "loanbook_amnt_per_state" ,'transform': color_mapper}, 
          fill_alpha=0.7, line_color="white", line_width=0.5)

# Add a Hover Tools over the U.S. States
hover = p.select_one(HoverTool)
hover.point_policy = "follow_mouse"
hover.tooltips = [
    ("State", "@state"),
    ("Loan Book Amount", "@loanbook_amnt_per_state{,.2f} USD"),
    ("(Long, Lat)", "($x, $y)"),
]

# Add a ColorBar Legend
color_bar = ColorBar(color_mapper=color_mapper, ticker=LogTicker(),
                     background_fill_alpha=0.7,
                     label_standoff=5, 
                     major_label_text_color='black', 
                     major_tick_line_color='black', major_tick_line_width=1.3, major_tick_out=5,
                     border_line_color=None, location=(0,0),
                     orientation='horizontal', width=500)
p.add_layout(color_bar, 'above')

show(p)


Loading BokehJS ...

2.2 Loan book Volume by U.S. States


In [8]:
# PROVIDE THE CHOROPLETH OF "LOAN BOOK VOLUME PER STATE"
palette2.reverse()
color_mapper = LogColorMapper(palette=palette2,
                              low=grouped_agg_df['loanbook_vol_per_state'].min(),
                              high=grouped_agg_df['loanbook_vol_per_state'].max())

# Define the figure "Tools" we want to make available
TOOLS = "pan, wheel_zoom, reset, hover, save"

# Plot the figure
# Define the figure dimensions and its general details
p = figure(title="Loan Book Volume by U.S. States", tools=TOOLS,
           plot_width=960, plot_height=500,
           x_range=(0, 960), y_range=(500, 0),
           x_axis_location=None, y_axis_location=None)

# Render the "Bokeh" patches in Glyph
p.patches('xs', 'ys', source=geo_source,
          fill_color={'field': "loanbook_vol_per_state" ,'transform': color_mapper}, 
          fill_alpha=0.7, line_color="white", line_width=0.5)

# Add a Hover Tools over the U.S. States
hover = p.select_one(HoverTool)
hover.point_policy = "follow_mouse"
hover.tooltips = [
    ("State", "@state"),
    ("Loan Book Volume", "@loanbook_vol_per_state{,}"),
    ("(Long, Lat)", "($x, $y)"),
]

# Add a ColorBar Legend
color_bar = ColorBar(color_mapper=color_mapper, ticker=LogTicker(),
                     background_fill_alpha=0.7,
                     label_standoff=5, 
                     major_label_text_color='black', 
                     major_tick_line_color='black', major_tick_line_width=1.3, major_tick_out=5,
                     border_line_color=None, location=(0,0),
                     orientation='horizontal', width=500)
p.add_layout(color_bar, 'above')

show(p)



In [ ]: